System for recruitment

ABSTRACT

A system for management of recruitment data may include (a) an interface for receiving and providing over a wide area computer network data regarding job openings and data regarding candidates to be matched to such job openings; (b) a database for storing the data regarding job openings and the data regarding the candidates, the database being organized according to one or more entity-relationship models; and (c) a computing hardware platform for executing a processing engine that is machine-learned from the data regarding job openings and the data regarding candidates, wherein the processing engine (i) creates the entity-relationship models over time; (ii) manages the interface to receive the data regarding job openings and the data regarding candidates and causes the received data to be stored in the database; (iii) matches candidates whose data are currently in the database to job openings currently in the database; (iv) receives historical data regarding actual filling of job openings in the database by candidates in the database; and (v) refines the entity-relationship models and the matching of current candidates to current job openings based on the historical data.

CROSS REFERENCE TO RELATED APPLICATION

The present application claims priority from U.S. Provisional Patent Application Ser. No. 62/211,569, filed on Aug. 28, 2015. The application is hereby incorporated by reference herein in its entirety.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a system for managing employment information and for matching potential employees to job openings at employers.

2. Discussion of the Related Art

In conventional recruitment practice, a recruiter spends a significant portion of his or her time on the job sourcing and reviewing resumes of potential employees and matching the potential employees to the available jobs. From a potential employee's perspective, identifying potential employers with suitable positions and getting his or her resume into the appropriate channels to reach such potential employers are time-consuming and complex tasks. As most employers and employees know, the most qualified potential employees are often those who are already in comfortable positions and are unlikely to be actively seeking the next job.

Economists refer to the recruitment and job-seeking processes as a “two-sided matching” problem, with significant transactional costs (e.g., time, material and information costs) incurred in bringing the well-matched employer and employee together. Thus, any tool that automates, simplifies or facilitates the process of identifying and matching the desirable candidates to suitable job openings is economically significant.

SUMMARY

According to one embodiment of the present invention, a system for management of recruitment data includes (a) an interface for receiving and providing over a wide area computer network data regarding job openings and data regarding candidates to be matched to such job openings; (b) a database for storing the data regarding job openings and the data regarding the candidates, the database being organized according to one or more entity-relationship models; and (c) a computing hardware platform for executing a processing engine that is machine-learned from the data regarding job openings and the data regarding candidates, wherein the processing engine (i) creates the entity-relationship models over time; (ii) manages the interface to receive the data regarding job openings and the data regarding candidates and causes the received data to be stored in the database; (iii) matches candidates whose data are currently in the database to job openings currently in the database; (iv) receives historical data regarding actual filling of job openings in the database by candidates in the database; and (v) refines the entity-relationship models and the matching of current candidates to current job openings based on the historical data.

In one embodiment of the present invention, the interface may include one or more servers for maintaining one or more web portals for access by users over the wide area computer network. One such web portal is customized for use by recruiting professionals. In that web portal, a user can upload job openings and candidate profiles, and receive matches of candidates in the current database with job openings in the current database. Another such web portal is customized for use by candidates for job openings. In addition to receiving a candidate's own profile information, the web portal for use by candidates may administer on-line technical competency tests and non-technical surveys or questionnaires to the candidates. Parsers are provided in the interface with the web portals to identify relevant information from the free-form resumes and job descriptions.

According to one embodiment of the present invention, a system of the present invention may include a third party integration module for allowing data to be obtained from or provided to third party programs. Such third party programs may include applicant tracking systems, candidate sourcing systems, and sources of professional and personal data. Additional data regarding the candidates may be obtained from third party programs.

According to one embodiment of the present invention, a system of the present invention may include a web crawler that provides the system data regarding candidates through exploration of information available on the wide area network.

Systems of the present invention provide more effective use of both available and acquired data to evaluate how well a candidate matches a particular job or role. According to one embodiment, data regarding a candidate collected from, for example, the candidate's curriculum vitae and the candidate's social and other online profiles and activities is supplemented with data collected through questionnaires or competence testing of the candidate. Such a process provides a direct evaluation of a candidate's skill qualifications and cultural fit. By using machine learning techniques to exploit deep and unapparent correlations among the data in a knowledge base, the system may develop a stronger and more accurate signal of how well a candidate will fit a particular job role. At each step, data is collected and fed back into the core engine to improve the accuracy of the candidate scoring.

The present invention is better understood upon consideration of the detailed description below in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows information flow diagram 100, which illustrates the collection and analysis of data suitable for implementing a recruitment tool, in accordance with one embodiment of the present invention.

FIG. 2 is a functional block diagram showing the major functional modules in system 200, in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

According to one embodiment of the present invention, a recruitment tool (“talent finder”) allows a user, who may be a recruiter or a hiring manager, to evaluate a large number of candidates against specific job requirements. FIG. 1 shows information flow diagram 100, which illustrates the collection and analysis of data suitable for implementing such a recruitment tool, in accordance with one embodiment of the present invention. The recruitment tool includes entity knowledge base 10 (also known as Entity Graph), which is a repository of information or database containing data collected from a variety of sources. As shown in FIG. 1, entity knowledge base 10 may include data of a variety of related data categories, e.g., candidates 101, job openings 102, company profiles 103, school profiles 104 and other data categories 105. The data within entity knowledge base 10 may be organized in one or more entity-relationship models (“entity-based knowledge graphs”), both within the data categories and across the data categories.

As shown in FIG. 1, data in entity knowledge base 10 regarding candidates for job openings may be sourced from resumes (represented in FIG. 1 by resumes 110) submitted by or collected from the candidates. For example, a user may upload one or more curricula vitae (“CVs”) or resumes. Such a user may be a recruiter or a job seeker. Typically, these documents are free-form. Accordingly, an automated tool (“resume parser 111”) parses the CVs or resumes for relevant information. Another automated tool (“data extraction tool 112”) extracts the identified relevant information and integrates the data into entity knowledge base 10 according to how the extracted data fit into the entity-based knowledge graphs. Optionally, data extraction tool 112 may also check whether relevant information has already been collected for a given candidate and avoid entering any duplicate information into entity knowledge base 10.

Similarly, a user may also upload one or more job descriptions (e.g., job descriptions 113). Each job description is then parsed by a job description parser (“job parser 114”). The parsed job description is also presented to data extraction tool 112, which extracts and integrates the relevant job description information into entity knowledge base 10.

Information regarding the candidates may also be collected from appropriate social and professional media websites or tools 115 (e.g., LinkedIn or Facebook). The candidates themselves may also be willing to provide information outside of their CVs or resumes (e.g., through surveys or questionnaires). In some instances, it may be appropriate to collect candidate information from broader sources (e.g., using a “data scraper” 119).

Based on the information collected and organized under the entity-based knowledge graphs, and a set of predetermined evaluation criteria (“feature construction 120”), a machine learning-based program (“core engine 121”) evaluates each candidate against each job opening to provide a set of scores 122 representing how well the candidate matches the specific job requirements of the job opening. If the user desires additional information about the candidates, the user may request that the candidates be surveyed using questionnaires, or be asked to perform specific test tasks intended for evaluating technical competence, non-technical aptitude, interest level and other criteria. After the questionnaires or test tasks are completed, the resulting additional information is incorporated into entity knowledge base 10 to allow further refinement of the candidate's scores. Where appropriate, the data collected about each candidate may be made available to all users.

It is expected that the scores generated by a recruitment tool of the present invention will be instrumental to the hiring decision. Thus, hiring decisions, whether positive or otherwise, may be used to improve system performance. For example, core engine 121 may be trained using historical “screening and hiring decisions 123”. The training process allows core engine 121 to recognize patterns in the candidate selection process, even patterns specific to a particular user, to provide better accuracy and a more positive user experience. The training process may be achieved using conventional machine-learning and testing techniques 124 and 125. Improvement in performance based on machine-training techniques may be shared across users.

Access control, account management, and other administrative functions 117 may be implemented to ensure privacy and integrity. Billing and payment functions 118 may also be implemented. The system may also interface with external software through, for example, application program interfaces.

In FIG. 1, activities shown within box 20 may be carried out on-line (i.e., interactively with a user or candidate through a graphical user interface). These on-line activities may include resume parsing in resume parser 111 and job opening parsing in job parser 114, candidate scoring and ranking 122, and on-line questionnaire interaction 116 with a candidate. Activities shown in box 30 may be considered “offline” activities, i.e., activities that are performed without interaction with a user or a candidate. Such activities, which include web crawling in web crawler 119, machine model retraining 124 and testing 125, may be run in the backend, either automatically or in ad hoc fashion.

According to one embodiment of the present invention, data collected from candidate CVs or resumes may include contact information (e.g., email addresses, telephone numbers, postal addresses), education background (e.g., universities or schools attended, academic credentials, including degrees obtained, grade point averages and scholarship awards), work and other experiences (e.g., industry companies or academic institutions worked for, full-time or part-time positions held, previous job titles, tenure, and responsibilities), relevant skills, list of publications, patents held, leadership and social involvements, and professional memberships. Such data may be augmented using candidate-provided links to external sources of professional information, such as LinkedIn and GitHub accounts. For example, as an indicator of the candidate's technical skill set, one may collect the number of contributions in the candidate's GitHub account, with different weights assigned to repositories of different popularity.
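The popularity-weighted contribution count mentioned above might be sketched as follows. This is only an illustration: the description does not specify a weighting function, so the use of a repository's star count as the popularity measure and the logarithmic weight are assumptions, and the function name is hypothetical.

```python
import math


def weighted_contribution_score(contributions):
    """Sum contribution counts, weighting each repository by its popularity.

    `contributions` is an iterable of (contribution_count, repo_stars) pairs.
    Assumption: popularity is measured by star count, and the weight grows
    as 1 + log(1 + stars) so that very popular repositories count more
    without dominating the total.
    """
    score = 0.0
    for count, stars in contributions:
        weight = 1.0 + math.log1p(stars)
        score += count * weight
    return score


# Ten contributions to an unstarred repository score 10.0; the same ten
# contributions to a widely starred repository score higher.
obscure = weighted_contribution_score([(10, 0)])
popular = weighted_contribution_score([(10, 1000)])
```

Any monotonically increasing weight would serve the same purpose; the logarithm is chosen here only to dampen the effect of extremely popular repositories.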

Data collected from job opening descriptions may include the company posting the job opening, job title, job location, responsibilities, required or desired skills, and highlighted keywords. Highlighted keywords are keywords supplied by the user to indicate to the system certain pieces of information that should be accorded greater weight. For example, if a company heavily uses certain programming languages or software packages, the highlighted keywords may include C++, Python and C#.

In addition to data collected through CVs and resumes, additional data may be collected through interaction with a candidate over a user interface. Such data may include specific skills, educational background or industry experience the candidate would like to highlight, and the candidate's connections and endorsements. Correlation of the candidate's connections and endorsements with the reported work experience may be useful to validate the candidate's rating.

In one embodiment, a non-technical survey is conducted with the candidate to elicit personality traits (e.g., active or passive personality), whether or not the candidate is open to a contractor position, as opposed to an employee position, the candidate's willingness to relocate, the profile of the company sought, the candidate's salary expectation, and the candidate's legal ability to work (e.g., visa status).

In one embodiment, the system collects additional information from the world-wide web, using web-crawling or data-scraping techniques. Such additional data includes information regarding the universities candidates attended (e.g., prestige, ranking of specific academic programs, specific degrees awarded, etc.). To help evaluate the substantiality of a candidate's experience, for example, such data may also include company profiles, ranking, corporate reputation or culture, and size. Company profile data may be collected from, for example, lists of the Global 2000 public companies by market size, the largest U.S. private companies, and the largest startups by valuation. Other information that may be of value includes salary surveys, as correlated with H1B sponsorship (available from, e.g., http://www.flcdatacenter.com/Download.aspx), and with region and occupation (available from, e.g., http://www.bls.gov/bls/blswage.htm). Other helpful information that may be collected for evaluation of suitability for a job opening may be, for example, a company's rating (available from, e.g., Glassdoor.com) and other indicia of a company's reputation. To evaluate the relevance of a candidate's skills and experience in certain industries or markets (e.g., foreign markets, such as China), data may be sourced through data partnership or other sources (e.g., crowd sourcing).

The entity-based knowledge graphs encompass all entities in entity knowledge base 10. Examples of entities include candidates, universities, academic institutions and schools, academic programs (e.g., Physics Graduate Program at Stanford University), industries (e.g., software engineering, data science), companies and jobs. The entities in the entity-based knowledge graphs are linked by edges that capture the relationships or interactions between the entities. These relationships represent facts (e.g., the candidate's alma mater, the degree or degrees received, the company the candidate is currently with, and the current title), the probabilities that the candidate possesses specific skills (e.g., the likelihood that the candidate is proficient in a specific programming language), the probabilities of the candidate being desirous of specific jobs, and the probabilities that the company having the job opening is desirous of a person having specific personal and professional traits. For example, such data captures relationships that would allow the system to conclude that company A hires candidates from top-tier MBA graduate programs 85% of the time for job C. The entity-based knowledge graphs are periodically updated, so as to reflect the latest status of the entities and the interactions among them.
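One way to picture such a graph is as a set of labeled edges, each carrying a probability, with facts stored at probability 1.0. The following is a minimal sketch under that assumption; the class names, the relation labels and the entity identifiers are all hypothetical, and the example edge for company A simplifies the 85% relationship by omitting its dependence on job C.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    probability: float = 1.0  # assumption: 1.0 denotes a fact


class EntityGraph:
    """Toy entity-relationship store keyed by source entity."""

    def __init__(self):
        self._out = defaultdict(list)

    def add_edge(self, source, relation, target, probability=1.0):
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        self._out[source].append(Edge(source, relation, target, probability))

    def neighbors(self, source, relation):
        """Return (target, probability) pairs for one relation of one entity."""
        return [
            (edge.target, edge.probability)
            for edge in self._out.get(source, [])
            if edge.relation == relation
        ]


graph = EntityGraph()
graph.add_edge("candidate:alice", "alma_mater", "school:stanford")  # a fact
graph.add_edge("candidate:alice", "has_skill", "skill:python", probability=0.9)
graph.add_edge("company:A", "hires_from", "program:top_tier_mba", probability=0.85)
```

A production system would more likely use a graph database or a relational schema; the in-memory dictionary here only illustrates how facts and probabilistic relationships can share one edge representation.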

In order to properly and accurately capture all relationships and interactions among entities in the entity-based knowledge graphs, a domain-specific taxonomy is developed. For example, the system is cognizant that “Experience with Oracle SQL, Microsoft SQL Server and MySQL” may be treated in most respects the same as “SQL experience.” Similarly, the system is cognizant that “Object-oriented programming languages” includes “Python”, “C++”, “Java”, etc.
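A taxonomy of this kind can be approximated by a child-to-parent mapping that is walked upward when comparing a candidate's skill with a requirement. The sketch below uses only the two examples given above; the mapping structure and the function names are assumptions made for illustration.

```python
# Hypothetical child -> parent mapping built from the two examples in the text.
PARENT = {
    "oracle sql": "sql",
    "microsoft sql server": "sql",
    "mysql": "sql",
    "python": "object-oriented programming languages",
    "c++": "object-oriented programming languages",
    "java": "object-oriented programming languages",
}


def expand(term):
    """Return the normalized term followed by all of its taxonomy ancestors."""
    chain = []
    current = term.strip().lower()
    # The membership check guards against accidental cycles in the mapping.
    while current is not None and current not in chain:
        chain.append(current)
        current = PARENT.get(current)
    return chain


def satisfies(candidate_skill, required_skill):
    """True if the candidate's skill is the required skill or a descendant of it."""
    return required_skill.strip().lower() in expand(candidate_skill)
```

With this mapping, `satisfies("MySQL", "SQL")` holds, while the reverse does not: generic SQL experience does not imply experience with a particular product.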

The entity-based knowledge graphs allow features to be constructed that relate a candidate to a job. These features allow predictive models to be built, using regression, random forest and other suitable data-driven learning techniques, to estimate the fit between the candidate and the job. Some example features include (a) academic credentials (e.g., numerical values may be assigned to B.S., M.S. and Ph.D. degrees); (b) number of years of professional experience; (c) similarities between current job responsibilities and the responsibilities specified in the job description (e.g., based on keyword and semantic matching); (d) quality of the alma mater (e.g., different numerical values may be assigned to different universities, which may be grouped into tiers); (e) difference between the candidate's current salary and the salary range offered in the job description; (f) number of years the candidate stayed at each previous job; and (g) number of years of experience in each skill highlighted by the user.
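A few of the features listed above can be sketched as a function that maps a candidate record and a job record to named numerical values. The record layout, the degree values, the use of Jaccard keyword overlap for responsibility similarity, and the treatment of total experience as the sum of tenures are all assumptions made for illustration.

```python
DEGREE_VALUE = {"B.S.": 1.0, "M.S.": 2.0, "Ph.D.": 3.0}  # hypothetical values


def keyword_overlap(a, b):
    """Jaccard overlap between two keyword collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_features(candidate, job):
    """Build a small feature dictionary relating one candidate to one job."""
    low, high = job["salary_range"]
    salary = candidate["current_salary"]
    # Distance of the current salary from the offered range; 0 when inside it.
    salary_gap = max(low - salary, 0, salary - high)
    tenures = candidate["years_per_job"]
    return {
        "degree": DEGREE_VALUE.get(candidate["degree"], 0.0),
        "years_experience": float(sum(tenures)),
        "responsibility_overlap": keyword_overlap(
            candidate["responsibilities"], job["responsibilities"]
        ),
        "salary_gap": float(salary_gap),
        "mean_tenure": sum(tenures) / len(tenures) if tenures else 0.0,
    }


candidate = {
    "degree": "M.S.",
    "current_salary": 90000,
    "years_per_job": [2, 4],
    "responsibilities": ["sql", "python", "reporting"],
}
job = {
    "salary_range": (100000, 120000),
    "responsibilities": ["python", "sql", "modeling"],
}
features = build_features(candidate, job)
```

A feature dictionary of this shape could then be turned into a numerical vector and passed to a regression or random forest model, as the description suggests.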

The system may also use these features to calculate a measure of similarity (“distance”) between candidates. Accordingly, the system provides a “lookalike candidate” feature to include or exclude candidates to be recommended for a job opening. When a user indicates that a candidate is a “strong fit” or “weak fit” for a job, the system may use that candidate as a reference to compute a distance between that candidate and each candidate in the candidate pool. A candidate with a small distance to the reference candidate may have his or her ranking upgraded or downgraded for the specific job opening, according to whether the reference candidate was rated as a “strong fit” or “weak fit,” respectively. A user's indication of preference or disfavor helps the system quickly learn the user's specific preferences, thereby improving the effectiveness of the recommendation. The distance measure may be based on a single feature. For example, if the feature is university education, the system may recommend another candidate who attended the same university and graduated from the same program. For a distance measure based on multiple features, the system may use a “weighted cosine similarity metric.” For example, assuming the features “salary” and “number of years of experience” of two candidates A and B are represented by the tuples (s_(A), e_(A)) and (s_(B), e_(B)), respectively, and these features are weighted w_(s) and w_(e), then the distance measure, using the weighted cosine similarity metric, would be given by

$\frac{{w_{s}s_{A}s_{B}} + {w_{e}e_{A}e_{B}}}{\sqrt{{w_{s}\left( {s_{A}^{2} + s_{B}^{2}} \right)} + {w_{e}\left( {e_{A}^{2} + e_{B}^{2}} \right)}}}.$

The values s_(A), e_(A), s_(B), e_(B), w_(s) and w_(e) are suitably normalized values, using normalization techniques familiar to those of ordinary skill in any of the fields of machine learning, and probability and statistics.
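The expression above can be evaluated for any number of features, as in the sketch below. The first function follows the expression exactly as written, generalized from two features to n. The second function is not taken from this description: it is the common textbook form of weighted cosine similarity, which divides by the product of the two weighted norms, and is included only for comparison. Function names are hypothetical, and inputs are assumed to be already normalized.

```python
import math


def weighted_similarity(a, b, weights):
    """Evaluate the expression given in the text for n features.

    numerator   = sum_i w_i * a_i * b_i
    denominator = sqrt(sum_i w_i * (a_i**2 + b_i**2))
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must have the same length")
    numerator = sum(w * x * y for w, x, y in zip(weights, a, b))
    denominator = math.sqrt(
        sum(w * (x * x + y * y) for w, x, y in zip(weights, a, b))
    )
    return numerator / denominator if denominator else 0.0


def textbook_weighted_cosine(a, b, weights):
    """Comparison only (an assumption, not from the text): the usual
    weighted cosine similarity, normalized by the product of the norms."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must have the same length")
    numerator = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    return numerator / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two candidates with identical normalized (salary, experience) features.
candidate_a = (0.6, 0.8)
candidate_b = (0.6, 0.8)
equal_weights = (1.0, 1.0)
as_written = weighted_similarity(candidate_a, candidate_b, equal_weights)
textbook = textbook_weighted_cosine(candidate_a, candidate_b, equal_weights)
```

For identical unit-length feature vectors, the expression as written evaluates to 1/√2, whereas the textbook form evaluates to 1. An implementer should therefore decide which normalization is intended before comparing values across candidate pairs.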

The system may offer online skill or competence testing to more accurately evaluate a candidate's technical proficiency. Results of the testing are fed into the machine-learning algorithms, together with other information that is gathered programmatically from the candidate's resume, LinkedIn profile, and other online activities. For example, one embodiment provides tests that cover essential technical skills that are required in data science, software engineering and other related fields. The tests may focus, for example, on real-world problem solving and understanding of fundamental concepts (e.g., statistical significance and computational cost), which are known to be critical to career success in such fields. Such tests are invaluable for obtaining skill and competence data that is not available in relatively quantified form from the candidate's resume or his or her LinkedIn profile. Examples of areas in which such tests are appropriate include proficiencies with SQL, Python, statistics, Hadoop, C++, Java, and Ruby. In one embodiment, the tests are designed to be: (a) lightweight, i.e., each test may consist, for example, of 15 or fewer multiple-choice questions, with an appropriate time limit (e.g., 15 minutes); (b) easily accessed (e.g., a candidate may elect to take such a test from a desktop computer or a mobile phone at any time, and wherever he or she finds convenient); (c) flexible (e.g., a recruiter or hiring manager may specify for the candidates which test or tests to take, as deemed most relevant to the job requirements); and (d) available (i.e., the test results are stored in the system for a relevant time period, and are made available to all recruiters selected by the candidate).

The system may also compile an insightful, detailed summary of the candidate's performance on the tests including, for example, how the candidate ranks relative to his or her peers, as well as the areas or topics in which the candidate performed well. In one embodiment, the summary report may read: “This candidate ranked in the 86th percentile in statistics, and demonstrated good knowledge of probability, sampling, and experiment design . . . .”
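A summary of this kind could be produced from a candidate's score and the scores of his or her peers, as in the following sketch. The definition of percentile used here (the share of peer scores strictly below the candidate's score) and the function names are assumptions; the description does not specify how the ranking is computed.

```python
def percentile_rank(score, peer_scores):
    """Percentage of peer scores strictly below `score`, rounded to an integer."""
    if not peer_scores:
        raise ValueError("peer_scores must not be empty")
    below = sum(1 for s in peer_scores if s < score)
    return round(100 * below / len(peer_scores))


def ordinal_suffix(n):
    """Return the English ordinal suffix for a non-negative integer."""
    if 11 <= n % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def summarize(subject, score, peer_scores, strong_topics):
    """Format a one-sentence summary modeled on the example in the text."""
    pct = percentile_rank(score, peer_scores)
    topics = ", ".join(strong_topics)
    return (
        f"This candidate ranked in the {pct}{ordinal_suffix(pct)} percentile "
        f"in {subject}, and demonstrated good knowledge of {topics}."
    )


peers = list(range(100))  # illustrative peer scores 0..99
report = summarize("statistics", 86, peers, ["probability", "sampling"])
```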

Suitable security features are implemented in the system to prevent cheating or other fraudulent actions (e.g., a candidate having another person take a test). Suitable security measures require a candidate to submit adequate identification to prevent fraud (e.g., a biometric signature).

FIG. 2 is a functional block diagram showing the major functional modules in system 200, in accordance with one embodiment of the present invention. As shown in FIG. 2, system 200 includes core engine 201, which may be software carrying out the core functions of system 200, including matching candidates to available jobs. Core engine 201 also constructs and maintains the entity-based knowledge graphs in the entity graph module 202. In one embodiment, as candidate data (e.g., a resume) or job description data is received or uploaded, core engine 201 tags the data with one or more relevant job classifications (based on the domain-specific taxonomy) to allow subsequent efficient processing. In this manner, candidates and job data classified to a specific job classification and related classifications may be very efficiently identified and processed. Providing such pre-processing allows system 200 to be scalable as the managed data grows.
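The tagging step described above can be pictured as building an index from job classification to record identifiers at upload time, so that later matching only examines records sharing a classification. The sketch below assumes simple keyword matching and a hypothetical keyword table; the description does not say how classifications are assigned.

```python
from collections import defaultdict

# Hypothetical keyword table standing in for the domain-specific taxonomy.
CLASSIFICATION_KEYWORDS = {
    "data science": {"statistics", "machine learning", "hadoop"},
    "software engineering": {"c++", "java", "ruby"},
}


def tag(text):
    """Return the set of classifications whose keywords occur in the text.

    Plain substring matching is used for brevity, so "java" would also
    match "javascript"; a real parser would tokenize first.
    """
    lowered = text.lower()
    return {
        label
        for label, keywords in CLASSIFICATION_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }


class ClassificationIndex:
    """Maps each job classification to the ids of records tagged with it."""

    def __init__(self):
        self._by_label = defaultdict(set)

    def add(self, record_id, text):
        labels = tag(text)
        for label in labels:
            self._by_label[label].add(record_id)
        return labels

    def lookup(self, label):
        return set(self._by_label.get(label, set()))


index = ClassificationIndex()
index.add("cand-1", "Five years of Java and C++ development")
index.add("job-7", "Seeking statistics and Hadoop expertise")
```

Because lookups touch only the records filed under one classification, the cost of matching grows with the size of that classification rather than with the whole knowledge base.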

Entity graph module 202 includes data organized by entities and relationships relating the entities. As discussed above, entities may be, for example, candidates, work places, job titles, educational institutions, degrees, school courses, projects, locations, computer languages, and so forth. The relationships may represent facts (e.g., the candidate's alma mater, the degree or degrees received, the company the candidate is currently with, and the current title), the probabilities that the candidate possesses specific skills (e.g., the likelihood that the candidate is proficient in a specific programming language), the probabilities of the candidate being desirous of specific jobs, and the probabilities that the company having the job opening is desirous of a person having specific personal and professional traits. Core engine 201 may retrieve from or save into entity graph module 202 data corresponding to any subset of entities and relationships.

Core engine 201 also manages recruiter web or mobile portals 203 (“recruiter portals 203”) and candidate web or mobile portals 204 (“candidate portals 204”). Through recruiter portals 203, a user may upload job descriptions and candidate CVs and resumes, review job and candidate data from the user and other sources, provide user-specific candidate preference and other data, access third party tools, and review recommendations of candidate-job opening matches from core engine 201. Core engine 201 also provides through recruiter portals 203 additional data helpful to recruiters (e.g., suggested job description templates and key phrases to be added to the user-provided job descriptions).

Through candidate portals 204, a candidate may upload his or her resume, and authenticate his or her professional and personal data that core engine 201 obtains from third party applications (e.g., LinkedIn, Facebook, and other social and professional sources). Core engine 201 also administers technical competence tests through candidate portals 204. Through candidate portals 204, a candidate may examine his or her matches to specific job openings recommended by core engine 201, and other employment related data (e.g., how the candidate matches up to his or her peers in similar jobs, similar industries, similar locations and other parameters).

In some embodiments, a plug-in may be provided to a web browser that is used to access recruiter portals 203 and candidate portals 204. The plug-in provides access to the functions that are specific to core engine 201. For example, the plug-in allows a user to access inline information about a candidate from any website on which the candidate's name appears.

Core engine 201 also interfaces with third party applications through third party integration module 205. In one embodiment, third party integration module 205 provides core engine 201 access to such systems as applicant tracking systems (“ATS”), job boards, tools that focus on candidate sourcing (e.g., Entelo, Piazza, etc.), human resource management systems (“HRM”), and other systems providing additional data (e.g., candidate profiles, feedback on candidates, and recruiter preferences). In addition, third party integration module 205 may share data maintained by core engine 201 with third party software. Integration with an ATS allows tracking of candidates through the hiring process. Integration with job boards allows access to additional candidate profile data and tracking of the jobs on each job board to which a candidate may have applied.

In one embodiment, core engine 201 receives data from one or more web crawlers and data scrapers, represented in FIG. 2 by data scraper 206. Similar to integration with third party applications, integration with a web crawler or data scraper allows users and data partners to access additional data for enhancing the entity-based knowledge graphs. For example, through third party integration module 205, core engine 201 accesses candidate profile data from social and professional data repositories (e.g., Facebook, LinkedIn, GitHub, and Quora) and other online databases (e.g., salary data from the H1B government website and Glassdoor). Such third party data may be gathered and aggregated to augment building and refinement of the entity-based knowledge graphs.

The above detailed description is provided to illustrate specific embodiments of the present invention and is not intended to be limiting. Numerous variations and modifications within the scope of the present invention are possible. The present invention is set forth in the accompanying claims.

We claim:
 1. A system for management of recruitment data, comprising: an interface for receiving and providing over a wide area computer network data regarding job openings and data regarding candidates to be matched to such job openings; a database for storing the data regarding job openings and the data regarding the candidates, the database being organized according to one or more entity-relationship models; and a computing hardware platform for executing a processing engine that is machine-learned from the data regarding job openings and the data regarding candidates, wherein the processing engine (a) creates the entity-relationship models over time; (b) manages the interface to receive the data regarding job openings and the data regarding candidates and causing the received data to be stored in the database; (c) matches candidates whose data are currently in the data base to job openings currently in the database; (d) receives historical data regarding actual filling of job openings in the database by candidates in the data base; and (e) refines the entity-relationship models and the matching of current candidates to current job openings based on the historical data.
 2. The system of claim 1, wherein the interface comprises one or more servers for maintaining one or more web portals for access by users over the wide area computer network.
 3. The system of claim 2, wherein the web portals comprise a web portal customized for use by recruiting professionals.
 4. The system of claim 3, wherein the web portal customized for use by recruiting professionals receives uploads of job openings and candidate profiles, and provides to the recruiting professionals the matching of candidates in the current data base with job openings in the current data base.
 5. The system of claim 2, wherein the web portals comprise a web portal customized for use by candidates to job openings.
 6. The system of claim 5, wherein the web portal customized for use by candidates administers on-line technical competency tests to the candidates.
 7. The system of claim 2, wherein the web portals receiving uploading of candidate resumes, wherein the web portals each comprise a parser for identifying the data regarding candidates from the candidate resumes.
 8. The system of claim 2, wherein the web portals receiving uploading of job opening descriptions, wherein the web portals each comprise a parser for identifying the data regarding job openings from the job opening descriptions.
 9. The system of claim 1, further comprising a third party integration module for allowing data to be obtained or to be provided to third party programs.
 10. The system of claim 9, wherein the third party programs comprise at least one of: one or more applicant tracking systems, one or more candidate sourcing systems, and one or more sources of professional and personal data.
 11. The system of claim 9, wherein a portion of the data regarding the candidates is obtained from third party programs.
 12. The system of claim 1, further comprising a data scraper that provides the system data regarding candidates through exploration of information available on the wide area network.
 13. The system of claim 1, wherein the processing engine further recommends candidates to fill a job opening based on learned user preferences.
 14. The system of claim 13, wherein the user preferences relative to job are learned from a user's ratings of one or more candidates matched to the job opening.
 15. The system of claim 14, wherein the processing engine recommends candidates to the job opening based on a distance measure based on one or more characteristics of each candidate to be recommended and the corresponding characteristics of the one or more rated candidates.